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In  this  paper  we  consider  the  numerical  solution  of  stiff  initial  value  problems,  which  lead 
to  the  problem  of  solving  large  systems  of  mildly  nonlinear  equations.  For  many  problems 
derived  from  engineering  and  science,  a  solution  is  possible  only  with  methods  derived  from 
iterative  linear  equation  solvers.  A  common  approach  to  solving  the  nonlinear  equations  is 
to  employ  an  approximate  solution  obtained  from  an  explicit  method.  In  this  paper  we  shall 
examine  the  error  to  determine  how  it  is  distributed  among  the  stiff  and  non-stiff  components, 
which  bears  on  the  choice  of  an  iterative  method.  Our  conclusion  is  that  error  is  (roughly) 
uniformly  distributed,  a  fact  that  suggests  the  Chebyshev  method  (and  the  accompanying 
Manteuffel  adaptive  parameter  algorithm).  We  describe  this  method,  also  commenting  on 
Richardson’s  method  and  its  advantages  for  large  problems.  We  then  apply  Richardson’s 
method  and  the  Chebyshev  method  with 
nonlinear  equations  by  Newton’s  method. 


the  Manteuffel  algorithm  to  the  solution  of  the 
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1  Introduction 


The  numerical  solution  of  initial  value  problems  requires  at  each  time-step  the  solution  of  an 
implicit  nonlinear  equation,  usually  by  a  Newton-like  iteration.  As  a  result  it  is  necessary  to 
solve  a  system  of  linear  equations,  which  is  often  large  and  sparse.  It  has  been  customary 
to  use  direct  methods,  but  in  recent  years  studies  on  the  suitability  of  iterative  methods 
[18,  9,  3,  1]  have  been  emerging. 

This  paper  examines  some  issues  in  the  application  of  iterative  methods  to  initial  value 
problems  (IVPs).  First,  we  identify  more  clearly  the  nature  of  the  task  to  be  performed  by 
the  iterative  solver.  A  number  of  comments  have  been  made  about  the  question  of  whether 
the  predictor  error  lies  primarily  in  the  dominant  subspace.  These  comments  have  not  been 
convincingly  supported  by  analysis.  The  difficulty  with  the  analysis  of  stiff  equations  is  that 
of  knowing  which  quantities  are  large  and  which  are  small  —  these  things  change  during  the 
course  of  the  integration.  We  believe  we  have  found  a  satisfactory  way  of  dealing  with  this 
question  and  that  is  to  regard  as  small  those  quantities  which  the  error  control  mechanisms 
of  the  algorithm  make  small.  Second,  we  explain  why  the  Chebyshev  method  might  be  well 
suited  as  an  inner  iteration  to  solve  the  linear  equations  arising  at  each  step  of  the  Newton 
step.  In  addition  we  consider  an  application  of  the  first  order  Chebyshev  method  to  the 
outer,  Newton  iteration.  (The  first  order  Chebyshev  method  is  Richardson’s  method  with 
Chebyshev  parameters.) 

This  paper  is  a  preliminary  theoretical  study;  the  accuracy  and  usefulness  of  our  obser¬ 
vations  await  experimental  confirmation. 

1.1  Outline  of  the  Paper 

In  §2,  the  standard  time-stepping  approach  for  solving  linear  and  nonlinear  stiff  IVPs  is 
described  in  a  simplified  way.  When  the  IVP  is  nonlinear,  a  variant  of  Newton’s  method  is 
employed  at  each  time  step,  usually  the  modified  Newton’s  method  in  which  the  Jacobian 
is  not  always  up-to-date.  (It  will  be  convenient  to  say  Newton’s  method,  however,  rather 
than  either  modified  Newton’s  method  or,  as  is  also  used,  the  modified  inexact  Newton’s 
method.)  Each  Newton  step  requires  the  solution  of  a  linear  system  for  which  the  matrix  is  a 
Jacobian,  either  the  current  Jacobian  or  one  saved  from  a  previous  Newton  step.  (Matrix-free 
methods  avoid  explicit  computation  of  the  Jacobian  but  from  time  to  time  the  Jacobian  will 
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be  referred  to  as  though  it  had  been  computed  and  is  available.)  Assume  that  an  iterative 
method  is  used  to  solve  the  linear  system.  This  is  an  inner  iteration  whereas  Newton’s 
method  is  an  outer  iteration  in  a  combined  inner-outer  iteration. 

Iterative  methods  differ  in  the  ways  in  which  the  error  is  reduced.  In  §3,  the  error  in  the 
predictor  step  is  analyzed.  The  predictor  step  may  be  thought  of  as  the  initial  guess  for  the 
inner  iteration,  leading  to  the  discussion  in  §4  of  appropriate  (inner)  iterative  methods  for 
reducing  the  error  in  the  predictor. 

In  §5,  an  application  of  techniques  employed  in  the  Chebyshev  iterative  method  is  made 
to  the  outer  iteration  to  give  more  control  over  convergence. 

1.2  Terminology  and  Notation 

It  is  customary  to  discuss  the  solution  of  IVPs  only  in  the  real  case.  However,  to  take  into 
account  the  importance  of  complex  IVPs,  we  use  the  more  general  and  more  appropriate 
terms  hermitian  and  nonhermitian  rather  than  symmetric  and  nonsymmetric. 

The  Jacobian  of  functions  /  and  F  will  be  denoted  by  f  and  F'  respectively. 

The  computed  approximation  to  the  solution  of  an  IVP  at  step  n  will  be  denoted  by  yn. 
Usually,  yn  is  obtained  by  approximately  solving  a  nonlinear  equation,  the  exact  solution  of 
which,  when  clarity  is  required,  will  be  denoted  by  y*. 

2  Nonlinear  Iteration  in  IVPs 

Assume  a  transient  problem  y'(t )  =  /(y(i))  with  an  appropriate  set  of  initial  values.  (For 
convenience,  we  assume  the  system  is  expressed  in  autonomous  form.)  If  the  equation  is  stiff, 
the  standard  numerical  solution  methods,  based  on  Newton’s  method,  yield  linear  systems  to 
be  solved.  We  illustrate  with  backward  Euler.  There  is  little  reason  to  believe  that  the  ideas 
do  not  apply  to  other  implicit  difference  schemes.  For  backward  Euler,  the  approximation, 
yn,  to  y(tn)  at  tn  is  the  computed  solution  of 

Vn  =Vn-l  +  Kf{y*n)  (l) 

This  is  an  implicit  equation  for  the  unknown  y*.  Rearranging  (1)  slightly,  it  is  necessary  to 
solve  an  equation  of  the  form 

F(y'*)  ~  T-y*  -  /«)  -  r-s»-i  =  o  (2) 

tln  fln 
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The  solution  of  (2)  by  Newton’s  method  requires  the  solution  of  the  linear  systems 

-  S<T>)  =  -F(s<m)).  (3) 

Customarily  using  a  direct  method  one  iterates  with 

y^'^y^-cG-'FiyW)  (4) 

where  G  is  a  Jacobian  of  F  kept  either  from  an  old  time  step  or  from  an  earlier  iteration 
and  c  is  an  acceleration  parameter.  Since  G  is  a  matrix,  it  follows  that  computing  the  vector 
x  —  G~lF(y<sn'>)  is  equivalent  to  solving  a  set  of  linear  equations, 

Gx  =  F(y<r>).  (5) 

The  solution  of  the  nonlinear  equation  (2)  is  called  the  corrector  step,  and  the  initial 
approximation  of  y :=  y%  obtained  from  an  explicit  multistep  formula  is  called  the  predictor 
step. 

One  could  instead  approximate  the  solution  of  (3)  by  means  of  a  matrix-free  iterative 
method.  Such  methods  require  only  that  we  be  able  to  compute  the  action  of  the  Jacobian 
matrix  F\y^)v  for  an  arbitrary  vector  v.  This  can  be  accurately  approximated  by 

i  +  Sv)  -  *■(»<,“>)] 

for  some  small  6  of  the  order  of  the  square  root  of  the  machine  epsilon.  Note  that  F( y^) 
is  already  available,  because  it  is  the  right-hand  side  of  the  linear  system  (3).  Hence  each 
iteration  of  a  typical  linear  iterative  solver  requires  but  one  function  evaluation.  The  stopping 
criterion  for  the  inner  linear  iteration  would  be  based  on  that  of  the  outer  Newton  it  ciation. 

This  question  of  when  to  stop  the  inner  iteration  is  not  easy:  one  would  like  to  avoid 
unnecessary  inner  iterations  without  degrading  the  convergence  of  the  outer  iteration.  Hence 
it  has  been  proposed  [4]  that  a  single  nonlinear  iteration  would  be  more  efficient  than  a 
two-level  iteration.  However  the  one-level  iteration  proposed  in  [4]  requires  two  function 
evaluations  per  iteration;  whereas  the  two-level  iteration  requires  one  function  evaluation 
per  inner  iteration  and  one  per  outer  iteration. 

Currently  it  seems  more  practical  to  use  iterative  methods  to  solve  (5)  if  there  is  some 
preconditioning,  although  this  usually  requires  that  an  explicit  matrix  be  available. 
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3  Accuracy  Needed 


Much  discussion  has  focused  on  this  objective  of  the  nonlinear  solution  to  impose  stability 
[9,  3].  In  this  section,  we  present  some  analysis  to  show  what  components  of  the  error  should 
be  reduced  by  an  iterative  method 

Consider  now  the  backward  Euler  formula 

Vn  =  l/n-1  +  hnf(y  n)  +  hnFn  (6) 

where  the  residual  (per  unit  step)  Fn  F(yn)  would  be  zero  if  the  nonlinear  system  were 
solved  exactly. 

One  question  concerns  the  nature  of  the  error  in  the  predictor.  In  practice,  the  predictor 
is  usually 

2/n  Vn—  1  ~h  2  (?) 

where  the  double  subscript  denotes  a  first  divided  difference.  (Often  this  is  written  y £  = 
2/n-i  +  hny'n_ j  where  y'n_x  is  chosen,  not  to  satisfy  y„_1  =  /{Vn-i),  but  rather  to  satisfy 
the  formula  used  on  the  (n  —  l)-st  step.  If  this  formula  is  backward  Euler,  then  yn_i  = 
yn~2  +  /in-i24_i  determines  y'n_1.)  The  residual  F%  :=  F(y £)  for  the  predicted  value  satisfies 

ypn  =  yn-1  +  hnf{yl)  +  hnF%  (8) 

and  the  nature  of  this  residual  depends  on  how  the  stepsize  is  controlled.  It  is  customary  to 
control  the  stepsize  /in_!  so  that  ||/en_i||  <  e  and  to  choose  hn  sc  that  r£||/en_i||  <  (safety 
factor)  e,  where  e  is  the  local  error  tolerance,  Ie„_i  is  a  local  error  estimate,  and  rn  is  the 
stepsize  ratio  hn/hn_ For  the  local  error  estimate,  it  is  common  to  use 

l^n  =  hnyn,n-l,n-2  (9) 

where  the  triple  subscript  denotes  a  second  divided  difference.  What  is  wanted  is  an  expres¬ 
sion  for  the  predictor  residual  which,  as  much  as  possible,  is  in  terms  of  quantities  known  to 
be  small.  It  will  be  shown  at  the  end  of  this  section  that 


Fp  =  (1  +  rn)Fn_i  -  rnFn_2 

—  t — r^(  1  +  r~^1)letl_1  +  nonlinear  term 
where  the  nonlinear  term  is  given  in  a  later  paragraph. 


(10) 
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Let  y*  be  the  exact  solution  of  F(y„)  —  0.  Codes  attempt  to  control  the  error,  8n  = 
Vn  ~  Vn,  in  the  solution  typically  so  that  it  is  no  greater  than  some  fraction  <f>  of  the  local 
error  tolerance: 

IIM  <  ^e(hopefully). 

In  practice,  an  approximation  for  the  error  is  necessary,  and  usually  this  is  a  scalar  multiple 
of  —  y •  The  widely  distributed  package  DASSL[20]  uses  tj>  =  1/3.  We  now  want 

to  look  at  (10)  to  determine  what  the  iterative  linear  equation  solver  must  do.  To  simplify 
this  description,  neglect  the  nonlinear  term  as  well  as  variations  in  the  stepsize  and  in  the 
Jacobian  matrix  f  of  /.  Under  these  assumptions  the  error  8n  in  solving  for  yn  satisfies 

and  the  predictor  error  8%  satisfies 

*5  =  2 8n_!  -  8n_2  +  2(1  -  hf')~1fen_1.  (11) 

The  conclusion  is  that  the  nonstiff  error  must  be  reduced  by  as  much  as  1/(3  +  2</>_1)  (set 
hf  —  0)  and  the  stiff  error  by  as  much  as  1/3  (set  hf  =  — oo).  Trying  to  control  the  error 
in  ?n  iteration  based  on  the  size  of  the  correction  is  tricky  unless  the  convergence  is  rapid; 
thus,  DASSL  requires  a  convergence  factor  not  greater  than  0.9.  Therefore,  for  an  iterative 
method  one  might  want  to  consider  controlling  the  residual  instead  [25,  3].  The  price  for 
this  is  that  instead  of  reducing  the  error  by  1/(3  +  4>~x)  in  just  the  nonstiff  components, 
the  more  severe  reduction  must  be  done  for  all  components.  In  addition,  [5]  argues  that  the 
possible  skewness  of  the  eigensystem  of  /'  can  make  it  risky  to  control  the  residual. 

One  might  consider  loosening  up  on  the  fraction  <f>\  however,  the  stability  of  the  method 
[16]  and  the  reliability  of  the  local  error  estimation  depends  on  having  an  accurate  solution 
of  the  nonlinear  system.  Nonetheless,  this  matter  should  be  re-examined  because  of  its 
importance  to  the  efficient  use  of  iterative  (non)linear  equation  solvers. 

It  remains  to  discuss  the  nonlinear  term  in  (10),  which  can  be  shown  to  be 

nonlinear  term  =  ^h2n(  1  +  r~1)(f")n_lyn_l>n_2yn_lin_2 

where  (/")„_ i  is  some  average  value  of  /".  It  seems  quite  possible  that  this  could  be  a 
significant  part  of  the  stiff  error  for  some  problems.  It  can  be  shown  that  the  contribution  of 
the  nonlinear  term  to  the  residual  can  be  computed  at  a  cost  of  one  function  evaluation  (and 
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estimated  for  practically  nothing).  It  is  a  question  whether  the  nonlinear  error  has  a  special 
structure,  and  whether  stepsize  control  should  ensure  that  the  nonlinear  error  is  small. 

Recently,  a  different  stepsize  control  strategy  has  been  proposed  in  [14],  which  for  our 
example  would  mean  using  hnynn_i  for  the  local  error.  This  is  consistent  with  current 
practice  in  nonstiff  solvers  where  the  stepsize  is  controlled  for  a  formula  that  is  one  order 
lower  than  that  actually  used.  The  foregoing  analysis  of  the  predictor  error  simplifies  in  this 
case. 

We  conclude  this  section  by  deriving  (10).  Using  Eqs.  (6n_2)2,  (6n_i),  and  (8n)  to 
determine  Fn_2,  Fn_i,  and  F*  with  y £  eliminated  by  Eq.  (7),  we  get 


and 


where 


F%  -  Fn_ i  =  z{tn)  -  z(tn—  i ) 


F n—  1  ~ '  Fn_  2  —  z[tn_  i)  2(fn_2)  (l/n— l,n  — 2  2/n-2,n-3) 


(12) 

(13) 


z{i)  —  /(j/n- 1  (t  tn_i)l/n_iin_2). 

Hence,  using  (12)  and  (13)  to  create  second  divided  differences,  we  get 


Fn,n- l,n-2  —  z[^nj  ^n-1  j  ^n-2  —  (^n  T  ^n-l)  (1  ~l~  rn-l  )l/n-l,n-2,n-3 


I  dt 


We  can  express  the  divided  difference 

z[tn,tn-l,tn-2}=  [  K{t)z"{t)i 

where  the  Peano  kernel  K{t)  is  the  hat  function  for  t  -  fn_!  divided  by  hn  +  hn_ x.  If  we 
define 

(/w)n-l  =  ^  /  K(t)fw(yn-1  +  (^  ~  tn-l)yn-l,n~2 )  dt 
and  use  (9),  we  get 

•^n,n-l,n-2  =  2  (/w  )n-  1  Vn-ltti-iVn- l,n-2  —  (^n  +  ^n-l)  (1  +  T  n_l)hn_llen-\ 

from  which  (10)  follows. 


2The  notation  means  that  in  (6,  n  is  replaced  by  n  -  2,  and  similarly  in  the  next  two  references. 
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4  Iterative  Linear  Solution  Methods 


In  this  section,  iterative  methods  will  be  sketched,  focusing  on  the  Chebyshev  method  and 
Richardson’s  method.  Technicalities  will  not  be  emphasized. 

4.1  Krylov  Subspace  Methods 

Let  Ax  ~  b  be  a  linear  set  such  as  in  (3).  Let  x^0)  be  an  initial  guess  and  :=  b  —  Ax^ 
be  the  initial  residual.  The  iterative  methods  that  will  be  considered  here  are  such  that  the 
solution  approximation  is  of  the  form  x^‘^  =  x^+y,  where  y  £  V{  span{r^°\  . . . ,  A,_M0)}, 
and  Vt  is  the  Krylov  subspace.  Such  methods  are  called  Krylov  subspace  methods.  (Another 
widely  used  term  for  Krylov  subspace  methods  is  polynomial  methods.)  They  include  and 
unify  a  large  class  of  methods,  such  as  the  conjugate  gradient  method,  and  for  nonhermitian 
matrices,  the  adaptive  Chebyshev  iteration  of  Manteuffel  [17],  and  Richardson’s  method 
together  with  GMRES  [21],  ORTHOMIN  and  other  conjugate  gradient-like  methods. 

4.2  The  Residual  Polynomial 

Let  x M  be  the  ith  approximation  resulting  from  an  iterative  method.  Also  let  eW  =  x  —  x^ 
and  A1)  =  b  —  Ax^  be  the  error  and  residual  respectively.  For  a  Krylov  subspace  method, 
the  error  e^  is  related  to  the  initial  error  by  a  residual  polynomial  R,  defined  by  e ^  = 
jR,(A)e(0h  The  name  results  from  the  fact  that  the  same  polynomial  propagates  the  residual, 
A*)  =  R,(A)A (The  GMRES  and  ORTHOMIN  methods  are  restart  methods  for  which 
may  be  the  initial  error  due  not  to  the  initial  approximation  of  Ax  =  b  but  to  some 
later  approximation.  Also  the  adaptive  Chebyshev  method  of  Manteuffel  [17]  is  a  restart 
method.) 

4.3  Types  of  Acceleration 

Methods  differ  in  the  way  that  the  residual  polynomial  is  determined.  Thus,  in  conjugate- 
gradient-like  methods  the  residual  polynomial  minimizes  a  certain  weighted  norm  of  the  error. 
In  the  Chebyshev  method  the  polynomial  is  chosen  as  that  polynomial  minimizing  a  uniform 
norm  over  an  ellipse  among  all  residual  polynomials  of  a  fixed  degree  and  turns  out  to  be 
a  scaled  and  translated  Chebyshev  polynomial.  There  are  practical  differences  among  these 
methods  of  course.  Conjugate-gradient-like  methods  determine  their  residual  polynomial 
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in  an  automatic  way  whereas  the  Chebyshev  method  requires  a  set  of  parameters,  which, 
nevertheless,  can  be  computed  in  an  effectively  automatic  way  by  means  of  the  Manteuffel 
algorithm  [17]. 

The  Manteuffel  algorithm  chooses  parameters  based  on  an  estimate  of  the  convex  hull  3 
of  the  eigenvalues,  beginning,  for  example,  with  a  point  known  to  be  inside  the  convex  hull. 
After  every  20  or  so  iterations  the  convex  hull  is  expanded  (and  the  parameters  recomputed) 
to  include  estimates  of  several  stray  eigenvalues.  The  combination  of  the  Chebyshev  method 
with  the  Manteuffel  algorithm  will  be  referred  to  as  the  adaptive  Chebyshev  method  (of 
Manteuffel). 

One  advantage  of  the  adaptive  Chebyshev  method  is  as  follows.  The  solution  of  stiff 
IVPs  yields  a  succession  of  linear  systems  to  solve,  each  one  often  not  much  different  from 
the  previously  solved  problem.  This  is  certainly  true  for  the  linear  systems  arising  from 
successive  Newton  iterations.  From  one  time-step  to  the  next  there  may  be  a  significant 
change  in  A  —  j^I  -  f  due  to  changes  in  h,  but  it  is  straightforward,  in  the  absence  of 
preconditioning,  to  calculate  what  this  does  to  the  eigenvalues  of  the  matrix  A.  The  idea 
then  is  that  eigenvalue  estimates  from  one  linear  system  can  be  used  as  initial  estimates 
for  the  next  system.  However,  because  the  algorithm  works  only  by  expanding  the  convex 
hull  approximation,  it  would  be  important  to  begin  the  solution  of  a  new  linear  system  by 
suitably  shrinking  the  final  convex  hull  from  the  previous  system  so  that  it  lies  inside  the 
true  convex  hull. 

4.4  Richardson’s  Method 

The  Chebyshev  method  minimizes  the  residual  polynomial  over  an  ellipse,  which  will  be 
called  the  Manteuffel  ellipse ,  containing  the  convex  hull  (approximately)  of  the  spectrum  of 
the  system  matrix  A.  If  the  convex  hull  is  not  well-approximated  by  the  enclosing  Manteuffel 
ellipse,  then  one  might  want  to  consider  an  iterative  method  for  which  the  residual  polynomial 
is  minimized  over  the  convex  hull.  If  the  convex  hull  is  not  elliptical  a  second  order  iteration 
is  not,  in  gen.ral,  possible.  (A  second  order  iteration  is  one  for  which  the  iterate  is  optimum 
at  each  step;  the  Chebyshev  iteration  is  an  example.  Second  order  iterations  require  that 
the  residual  polynomial  be  generated  by  a  three  term  recursion.  In  general  a  three  term 
recursion  does  not  exist.)  However,  a  first  order  method  is  always  possible.  A  first  order 
3The  convex  hull  of  a  set  of  points  is  the  intersection  of  the  convex  sets  containing  the  set  of  points. 
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method  is  often  called  Richardson’s  method.  (For  a  statement  of  Richardson’s  method  in 
algorithm  form,  which  is  not  necessary  for  this  discussion,  see  [13].) 

It  should  be  noted  that  Richardson’s  method  does  not  require  working  with  a  convex 
hull;  a  general  nonconvex  set  containing  the  eigenvalues  may  be  used  instead.  A  reason  for 
using  the  convex  hull  is  that  it  may  be  computed  from  known  procedures,  whereas  there  is 
no  known,  general  procedure  for  computing  a  non-convex  region  containing  the  spectrum. 

If  Richardson’s  method  is  used,  the  parameter  difficulty  is  much  the  same  as  for  the 
adaptive  Chebyshev  algorithm  of  Manteuffel.  In  fact  if  the  residual  polynomial  is  a  scaled 
and  translated  Chebyshev  polynomial,  then  Richardson’s  method  is  just  a  first  order  imple¬ 
mentation  of  the  second  order  Chebyshev  method  used  in  the  Manteuffel  algorithm. 

In  [23],  an  adaptive  algorithm  is  presented  for  Richardson’s  method  that  determines  the 
convex  hull  of  the  spectrum  in  a  way  analogous  to  the  Manteuffel  algorithm.  It  should  be 
noted  that  the  adaptive  method  in  [23]  is  a  combination  of  two  iterative  styles,  one  from  the 
GMRES  method,  used  both  to  advance  the  solution  and  compute  an  approximate  (convex 
hull  of  the)  spectrum,  and  the  other  from  Richardson’s  method.  This  combination  was 
suggested  in  [6]. 

One  final  note  on  Richardson’s  method:  The  simplicity  of  the  method  is  an  advantage 
on  advanced  processors  when  organizing  the  computations  to  minimize  data  traffic  [22], 

4.4.1  Minimizing  the  L2  Norm  of  Residual  Polynomials 

If  the  convex  hull  is  determined,  then  one  can  determine  the  residual  polynomial  in  an 
optimum  way  to  satisfy  a  weighted  L2  norm  induced  from  the  inner  product, 

(g,h)w  =  Jrg(\)h(\)w(\)\d\\  (14) 

where  T  is  a  contour  in  the  complex  plane,  the  boundary  of  the  convex  hull  of  the  spectrum 
in  the  application  to  the  Richardson’s  method  parameter  problem. 

The  L2  optimal  residual  polynomial  of  degree  k  is  defined  to  be  the  solution  of  the  least 
squares  problem  (Rk,  Rk)w  =  minimum.  There  are  methods  [24]  to  compute  the  roots  of  this 
polynomial  related  to  the  stable  algorithm  of  Golub  and  Welsch  for  the  computation  of  nodes 
and  weights  for  Gaussian  quadrature  [11],  The  reciprocals  of  the  roots  of  the  polynomial 
then  become  the  parameters  required  for  Richardson’s  method. 
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4.4.2  Minimizing  the  Uniform  Norm  of  Residual  Polynomials 

In  the  literature  recently  there  have  been  ideas  developed  for  the  practical  computation  of 
Richardson’s  method  parameter  for  which  the  residual  polynomial  is  approximately  min¬ 
imized  in  the  uniform  norm  [7,  8,  19,  26].  These  methods  use  the  theory  of  conformal 
mapping  and  the  properties  of  Faber  polynomials.  The  resulting  residual  polynomial  may 
be  called  here  the  L oo  optimal  residual  polynomial.  An  important  contribution  from  [8,  26] 
is  an  algorithm  for  ordering  the  parameters  for  Richardson’s  method  to  ensure  stability. 

In  the  adaptive  algorithm  in  [23],  an  L2  optimal  residual  polynomial  is  computed  and 
used  but  this  is  not  a  restriction  and  the  L0 0  optimal  residual  polynomial  could  be  used 
instead. 

4.5  The  Appropriate  Iterative  Method  for  IVPs 

The  nature  of  the  error  reduction  needed  to  solve  the  IVP  was  briefly  explained  in  §3. 
Equation  (11)  shows  that 

%  =  e'  +  A-'e" 

where  A  —  I  —  hf,  He'll  <  3 and  ||e"||  <  2e.  From  this  expression,  one  sees  that  the 
nonstiff  error  must  be  reduced  by  as  much  as  1/(3  +  2 4>~x)  and  the  stiff  error  by  as  much  as 
1/3.  A  simplified  and  satisfactory  conclusion  to  draw  from  this  is  that  the  stiff  and  nonstiff 
errors  should  be  damped  uniformly.  Thus  if  the  spectrum  is  well  approximated  by  an  ellipse, 
the  adaptive  Chebyshev  method  of  Manteuffel  is  reasonable.  If  it  is  not,  one  may  want  to 
consider  Richardson’s  method.  If  the  residual  polynomial  is  determined  by  minimizing  the 
L2  norm  of  the  residual  polynomial  derived  from  tne  inner  product  (14),  then  a  reasonable 
choice  for  the  weight  is 

w{\)  =  1. 

It  should  be  noted  that  this  is  net  the  Chebyshev  weight  function,  which  is  the  weight 
function  that  should  be  selected  if  the  convex  hull  is  an  interval  (of  positive  numbers),  and 
so  causes  the  L2  norm  to  behave  like  the  uniform  norm  only  approximately. 

In  using  either  the  conjugate  gradient  method  in  the  hermjtian  positive  deimite  case 
or  conjugate  gradient-like  methods  in  the  nonhermitian  case  a  weight  is  imposed  on  eigen- 
components  of  the  error  equal  to  the  magnitude  of  the  corresponding  eigenvalue.  As  argued 
here,  such  a  weight  function  is  not  appropriate. 
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5  Accelerating  the  Newton  Iteration 

From  §2,  the  Newton  iteration  is,  for  m  >  0, 

»tm+1)  =  (15) 

where  c  is  an  acceleration  parameter  and 

F(y)  :=  r-y  -  f(y)  -  r-yn- i-  (16) 

/in  /in 

(Recall  that  the  solution  of  F( y)  =  0  is  denoted  by  y  =  y*.)  A  sequence  of  acceleration 
parameters  will  be  derived  by  interpreting  iteration  (15)  as  a  preconditioned  Richardson’s 
iteration.  With  an  assumption  of  linearity,  acceleration  parameters  may  then  be  derived  by  a 
straightforward  application  of  standard  adaptive  iteration  parameter  techniques  [17,  13,  23] . 
Other  approaches  are  possible,  for  example,  [15].  An  extended  treatment  of  the  solution  of 
nonlinear  equation  by  Krylov  subspace  methods  is  given  in  [2]. 

5.1  Krylov  Subspace  from  a  Nonlinear  Operator 

A  subspace  will  be  defined  that  is  generated  by  the  preconditioned  nonlinear  operator,  G~lF, 
appearing  in  Newton’s  method.  A  linear  approximation  to  the  nonlinear  operator  allows 
the  subspace  to  be  interpreted  as  a  Krylov  subspace.  In  turn,  the  Krylov  subspace  yields 
acceleration  parameters  based  on  Chebyshev  polynomials. 

5.1.1  Linear  Approximation 

Let  M  be  the  Jacobian  of  G~l  F  evaluated  at  y^\  M  =  G~l F'(y^).  Matrix  M  is  only  an 
aid  to  exposition,  and,  of  course,  is  never  computed.  The  true  assumption  is  not  that  the 
Jacobian  is  evaluated  at  y ^  but  that  the  Jacobian  is  slowly  varying  during  some  portion  of 
the  Newton  iteration.  If  so,  then  the  point  at  which  M  is  said  to  be  the  Jacobian  of  G~l  F 
is  arbitrary,  and,  in  particular,  the  choice  y^  is  arbitrary. 

Since  G  is  the  Jacobian  of  F  evaluated  at  some  yjj1  \  it  would  follow  that  M  ^  I  unless 
y^T*  ^  -  y^ .  Therefore,  assume  that  ^  ^  y^ . 

5.1.2  Newton’s  Method  as  Richardson’s  Method 

Given  an  approximation  y ^  to  the  solution  of  G~lF(yn )  =  0,  let 

7-(m)  =  -G~lF{yW) 
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be  the  residual  due  to  y Equation  (15)  becomes 


y<r>  =  si”-'1  + 

(17) 

Let 

e(m)  =  s;  -  tf"1- 

Subtract  (17)  from  y*  =  y*  to  obtain 

e(m)  =  e(m-i)  _  cr(™-i). 

(18) 

Next,  since 

-  G~lF{yim))  «  Me'™) 

it  follows  from  (18)  that 

e(m)  a  (7-cM)e(m_1). 

(19) 

Therefore 

e<m)  a  (7  -  cA7)me(0). 

(20) 

Multiply  (20)  by  M  to  obtain 

r<m)  a  (7  -  cA7)mr(0). 

(21) 

5.1.3  Krylov  Subspace 

As  the  Newton  iteration  proceeds,  a  Krylov  subspace,  14,  is  defined  by  7  -  cM 

H:=.pan {r<°>, cM)*-1/0*}. 


Although  14  is  not  computed,  the  subspace 


is  available  and 


Vfc  :=  span{r(,)}-=o 

14  %  Vk. 


Computations  requiring  14  can  be  performed  using  14. 
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5.2  Chebyshev  Acceleration  Parameters 

From  the  k  vectors  in  14,  k  —  1  eigenvalues  of  M  may  be  estimated.  (It  should  be  noted 
that  in  practice,  A:  is  a  small  number;  in  the  code  DASSL,  the  total  number  of  Newton 
steps  is  limited  to  4.)  From  these  eigenvalues,  the  convex  hull  of  the  spectrum  of  I  —  cM 
can  be  computed,  which  yields  [17]  two  parameters  d  and  ceiiipse  from  which  a  sequence 
of  acceleration  parameters  c  may  be  computed.  Parameter  d  is  the  center  of  a  family  of 
confocal  ellipses  for  which  the  foci  are  d±cciUpse.  The  Manteuffel  algorithm  determines  these 
parameters  in  order  to  optimize  convergence  of  the  Chebyshev  method. 

The  d  and  ceuipae  parameters  yield  a  sequence  of  Chebyshev  parameters  to  use  as  follows. 
Let  k  be  the  total  number  of  Newton  steps  to  be  executed.  Of  course  k  is  not  known; 
however,  for  the  moment,  assume  that  it  is.  Let  cm_!,  m  =  k  -f  1, . . .  ,  /,  1  <  k  be  the 
reciprocals  of  the  roots  of  the  polynomial 

Ci-kW  =  T^k  (— - — 'j 

\  ^ellipse  / 

where  T]  is  the  Chebyshev  polynomial  of  degree  j. 

In  place  of  (17),  execute 


(m) 

n 


=  y 


(m-l) 

n 


(22) 


for  m  =  k  -f  1, . . . ,  /. 

There  is  an  obvious  difficulty  to  resolve:  the  number  l  must  be  determined  in  advance, 
and  must  satisfy  /  <  k,  where  «  is  unknown.  One  approach  is  this.  Choose  some  fixed  l  and 
compute  the  Chebyshev  parameters.  It  is  not  expected  that  /  =  k  so  that  either  there  are  too 
few  parameters  cm  to  get  to  Newton  step  k  or  there  are  too  many.  If  there  are  not  enough, 
then  recycle  the  old,  computed  parameters  cm,  in  which  case  eventually  either  there  will  be 
unused  parameters  cm  or  all  parameters  will  have  been  used.  Thus,  plans  must  be  made 
for  unused  parameters.  To  prepare  for  too  many  parameters,  order  c*, . . . ,  c/_ x  by  either  of 
the  methods  of  [26]  or  [8],  which  exploit  the  known  mapping  of  Chebyshev  polynomial  roots 
onto  the  unit  circle  to  obtain  an  ordering  for  which,  heuristically,  ]|r^m^ ||  <  ||^rn“1^||-  (These 
ordering  techniques  are  more  general,  however,  and  are  not  restricted  to  the  Chebyshev 
case.)  With  this  ordering  one  may  reasonably  choose  l  —  k  =  4,  even  though  this  number  of 
parameters  is  not  likely  to  be  reached  within  a  single  time  step  by  a  solver  such  as  DASSL. 


13 


Another  approach  to  the  problem  of  excessive  parameters  cm  is  to  use  the  second  order 
formulation  of  the  Chebyshev  iteration.  The  following  is  an  adaptation  of  the  algorithm  in 
[17].  Recall  that  the  residual  Ak)  is  F(y^). 

1.  Set  Qi+1  :=  1/d. 

2.  Set  AyW  :=  ak+1Ak\ 

3.  Set  :=  y£)  +  AyW 

4.  Set  Ak+1)  :=  F(y(fc+1)). 

5.  Do  for  m  =  k  +  2  by  1  until  convergence: 

5.1  Set  um  :=  l/([d  -  4liP]am-i/4). 

5.2  Set  7m  :=  dctm  -  1. 

5.3  Set  ;= 

5.4  Set  y£")  :=  y^~l)  +  Ay^"1). 

5.5  Set  r(m)  := 

Enddo 

For  another  application  of  Richardson’s  method  to  nonlinear  problems,  see  [10,  12]. 

5.3  Changing  the  Stepsize 

Changing  the  stepsize  causes  a  potentially  large  change  in  the  Jacobian  of  F.  The  effect  of 
this  on  the  Newton’s  method  Chebyshev  parameters  will  be  estimated  under  the  assumption 
that  no  preconditioning  was  used  at  step  n  to  solve  the  linear  systems  with  matrix  G. 
Additional  assumptions  are  (i)  the  matrix  G  does  not  change;  (ii)  the  solution  of  the  linear 
systems  at  step  n  with  matrix  G  has  been  by  the  Manteuffel  adaptive  Chebyshev  algorithm, 
which  yields  the  convex  hull  of  G\  (iii)  G  was  not  preconditioned  during  the  Chebyshev 
iteration;  and  (iv)  that  the  eigenvectors  of  F'(y„l i)  and  F'^yl^1^)  are  approximately  the 
same. 

5.3.1  Operator  Approximation  at  Step  n  +  1 

We  outline  a  technique  to  apply  the  work  to  compute  the  parameters  at  step  n  to  the 
computation  of  the  parameters  at  step  n  +  1. 

At  this  point  it  is  convenient  to  denote  the  dependence  of  F  and  M  on  n  by  Fn  and  Mn. 
The  acceleration  parameters  Cm  are  determined  by  the  convex  hull  of  approximations  to 
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eigenvalues  of 


Af„+.  =  G-'K+.foS,)- 


Consider  the  expression  on  the  right.  Recall  that 

g  =  -  /'(ji?'1). 

The  other  factor  on  the  right  of  (23)  is 


Cm  (sS.)  =  U,/-/'(yS.). 


Therefore, 


m„m  =  -  /'(»'7'))|  '  | KU  -  /UZ)j  ■ 


■i  T  _  fV,/0)  'd 


5.3.2  The  Convex  Hull  at  Step  ra  -f  1 

The  convex  hull  of  Mn+1  in  (24)  may  be  estimated  in  terms  of  known  quantities  as  follows. 


It  is  assumed  that 


/'(s$.)  =  f'(y  i.0)). 


1  J_ 

^n+1 


From  (25)  and  (26),  (24)  becomes 


M„m  »  [ h'~‘I  -  /'feJT))"1  \yJ  +  K'l  -  /'(si”1)]  (27) 

=  p.„G'  +  Mn.  (28) 

The  convex  hull  of  Mn+i  is  needed.  The  eigenvectors  of  f(y jf))  and  f(y ^)  are  assumed 
to  be  approximately  the  same.  It  follows  that  the  eigenvectors  of  Mn  and  G  are  approximately 
the  same.  Therefore,  if  the  convex  hull  of  each  matrix  on  the  right  of  (28)  were  known  then 
an  approximate  convex  hull  of  the  sum  could  be  easily  computed.  The  convex  hull  of  Mn 
is  assumed  known  from  the  preceding  time  step.  It  remains  to  determine  the  convex  hull  of 
G_1  approximately. 

The  adaptive  Chebyshev  method  was  assumed  to  have  been  used  to  solve  the  linear 
systems  at  the  preceding  step,  n,  for  which  the  matrix  is  G.  4  The  convex  hull  of  G  is 
therefore  known.  An  approximation  to  the  convex  hull  of  G-1  is  therefore  easily  obtained. 


4Note  that  this  application  of  the  adaptive  Chebyshev  method  is  in  the  execution  of  the  inner  iteration 
rather  than  the  outer  Newton  iteration. 
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6  Summary 


In  this  paper,  a  simplified  analysis  has  been  presented  for  the  predictor  error.  It  suggests 
that  when  using  an  iterative  method  for  the  inner  iteration  (with  Newton’s  method  or  a 
variant  of  Newton’s  method  as  the  outer  iteration),  the  iterative  method  should  be  one  for 
which  components  of  the  error  are  damped  approximately  uniformly.  Two  classes  of  iterative 
methods  are  sketched  for  which  the  error  is  damped  in  a  uniform  way,  the  adaptive  Chebyshev 
method  of  Manteuffel  and  Richardson’s  method.  Richardson’s  method  is  preferable  if  the 
spectrum  of  the  inner  iteration  matrix  is  not  well  approximated  by  an  ellipse. 

If  the  (modified)  Newton’s  method  step  is  viewed  as  one  step  of  Richardson’s  method, 
the  usual  techniques  for  computing  iteration  parameters  adaptively  may  be  used  to  enhance 
convergence  of  Newton’s  method  with  a  parameter  sequence.  In  the  case  when  no  precondi¬ 
tioning  is  used  for  the  inner  iteration,  the  parameters  may  be  recomputed  at  low  cost  when 
the  stepsize  changes. 
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